# COMPSCI 389: Introduction to Machine Learning
# Topic 0.1: Python and Jupyter Notebooks

In this course we will use Python, and specifically Jupyter Notebooks like this one. This notebook provides a brief introduction to Python and Jupyter notebooks.

## Python

Python is a high-level programming language.
- It is *interpreted*, meaning that it is not compiled into executables, but rather executed directly from the source code by a program called an "interpreter".
- It is a popular language for machine learning.
- It is *very* slow in comparison to compiled languages.
 - Many Python libraries call C++ code, making them efficient.
 - Efficient use of Python leverages these library calls for anything compute-intensive.
 - Even when writing careful Python, I've found that students (BS-PhD) produce Python code that is around 6x to 100x slower than the corresponding C++ code.
- Python code is typically stored in `.py` files.
- Common integrated development environments (IDEs, programs for writing and running Python files) include Visual Studio Code (VSCode) and PyCharm.
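
As a rough illustration of the point about library calls, the sketch below sums the same numbers twice: once with an explicit Python loop and once with the built-in `sum`, which is implemented in C in the standard CPython interpreter. The function name `loop_sum` and the problem size are illustrative choices, and the measured times will vary by machine, so treat them as indicative only.

```python
import timeit

def loop_sum(values):
    """Add up the values one at a time in interpreted Python."""
    total = 0
    for v in values:
        total += v
    return total

values = list(range(200_000))

# Both approaches compute the same result.
loop_total = loop_sum(values)
builtin_total = sum(values)

# Time each approach over a few repetitions.
loop_time = timeit.timeit(lambda: loop_sum(values), number=5)
builtin_time = timeit.timeit(lambda: sum(values), number=5)

print("Same result:", loop_total == builtin_total)
print(f"Python loop:  {loop_time:.4f} s")
print(f"Built-in sum: {builtin_time:.4f} s")
```

On most machines the built-in call should be noticeably faster, which is the same effect that libraries backed by compiled code rely on.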

## Jupyter Notebooks

Jupyter notebooks were previously called IPython notebooks, when they were restricted to Python. They have since been extended to work with many different programming languages (within a single file!) and were thus renamed to Jupyter notebooks. However, the file type retains the old name: `.ipynb` for "IPYthon NoteBook".

This document is a Jupyter notebook. It consists of a vertical stack of "cells". Each cell has a type, including "markdown" and "Python".

You can edit a cell by double clicking on it. If you double click on this cell, you should see the raw markdown (a language for displaying text). You should see the type of the cell in the bottom right - in this case it should say "markdown" in the bottom right of this cell.

If you click on the bottom right of the cell where it lists the cell type, you can change the cell to a different type. We will mainly use:
- Markdown cells for displaying text.
- Python cells for displaying *and running* code.

When you have finished editing a markdown cell, you can click the check mark in the top right of the cell to stop editing it. This renders the cell, and we say the cell is "run". You can also hit `ctrl+enter` to run any cell or `shift+enter` to run any cell and move to the next cell. If you hit `shift+enter` on the last cell, it will automatically create a new cell after the current one.

## Markdown Cells

(Edit this cell to see the underlying formatting.)

In markdown cells you can have **bold** or *italic* text. You can have inline code like `this` or code blocks like this:
```
print("Hello World!")
```
You can have block quotes like this:
> I am not a crook.

You can have [links](https://google.com).

You can have comments, which are not shown when the cell is rendered, using HTML comment syntax: `<!-- like this -->`.

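You can have images using the syntax `![alt text](image-url)`, where `image-url` is a placeholder for the image's address.

If you want to control the width so it's not too big, you can instead use an HTML `<img>` tag with a `width` attribute (for example, `width="400"` sets the width to 400 pixels).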

You can make horizontal lines like this:

---

You can change the font color using HTML, for example `<font color="red">like this</font>`.

You can have headers
# Header 1
## Header 2
### Header 3
#### Header 4

You can make lists like this:
- Dog
- Cat
 - Tabby
 - Calico
- Mouse

Or like this:
1. Apple
2. Orange
3. Pear

You can include math like this $\pi \neq \int_{-\infty}^\infty x^2 \, \text{d}x$. The language used to display math is called LaTeX. Most computer science papers are written using LaTeX. You can find a free (commonly used) LaTeX editor at https://www.overleaf.com/. VSCode only works with a restricted set of features from LaTeX, but enough to write basic equations.

Here's another example that creates an "align" block in LaTeX, which aligns the & character on each line of the equation.
$$
\begin{align}
a &= b + c \\
x &= y - z.
\end{align}
$$

[And much much more](https://www.markdownguide.org/basic-syntax/)!


## Python Cells

Python cells contain Python code, but are not automatically run. Python cells can be run by clicking the triangle "run" button in the top left of the cell or by using the `ctrl+enter` (run cell) or `shift+enter` (run cell and move to the next cell) commands.

The first time that you run a cell, VSCode may prompt you for two things.

1. "Do you trust the authors of the files in this workspace?" You are running the program in the notebook, so ensure you trust the source of the notebook.
2. What "Kernel" would you like to use? This is asking which installation of Python to use. Depending on your operating system, the current selection should be visible somewhere near the top-right or bottom-right of VSCode. It should say, for example, "Python 3.11.7". If you click this text, you can select different versions of Python (or different virtual environments) to use.

Once you have trusted the workspace and selected a Python kernel, the Python cell should run, showing the output below the code cell.

In [26]:
print("Hello World")
print("We can ", "print many things! ", 42)

Hello World
We can print many things! 42


## Python Basics

In Python:
- Object types do not need to be specified.
- Indentation (leading whitespace, conventionally four spaces per level) is used to denote where the bodies of if-statements, loops, function definitions, etc. begin and end.
- Packages are installed using commands like this, run in the command line:

> pip install numpy

Note that this will install numpy into your default Python installation. If you're using a different Python kernel in VSCode, you need to ensure that you install numpy (or the desired library) for that kernel!
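
One way to check which Python installation a notebook kernel is using is to print `sys.executable` from a code cell. This is a small sketch using only the standard library; the variable name `install_command` is illustrative, and the command it holds is only printed as a suggestion, not run.

```python
import sys

# Path to the Python interpreter that is running this code (the current kernel).
print("Interpreter:", sys.executable)

# Version of that interpreter.
print("Version:", sys.version.split()[0])

# Running pip through this exact interpreter installs into the same environment.
install_command = f'"{sys.executable}" -m pip install numpy'
print("To install numpy for this kernel, run:", install_command)
```

Running the printed command in a terminal should install the package for the same interpreter that the kernel uses.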

Here is an example of untyped variables, comments, and indentation:

In [27]:
a = 10 # Comments come after '#' symbols. Notice we didn't say the type of a
b = 20 # Semicolons are optional after lines, and usually not included!

if a < b:
    print("a is less than b")
    print("We got here 1.") # This is within the if-statement

a is less than b
We got here 1.


## Imports

In Java you write:
```
import java.util.Scanner;
```

In Python you write:
```
import math
```

Or, if you want one specific function:
```
from math import sqrt
```

Or, if you want to use the math library but don't want to type out "math" every time, you can give it a shorter name:
```
import math as mth
```


In [36]:
import math
print(math.sqrt(16))

4.0


In [37]:
from math import sqrt

print(sqrt(16))

4.0


In [38]:
import math as mth

print(mth.sqrt(16)) # This is a silly example, but for long library names this can save a lot of space.

4.0


In [39]:
# Like any other code, import statements from prior cells persist!
print(sqrt(16)) # This uses from math import sqrt

4.0
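
The three import styles above differ only in which name ends up in your code. As a small sketch using the same `math` examples, the checks below suggest that each style refers to the very same module and function objects:

```python
import math
import math as mth
from math import sqrt

# The alias is just a second name for the same module.
print(mth is math)            # True

# All of these names refer to the same function object.
print(math.sqrt is sqrt)      # True
print(mth.sqrt is sqrt)       # True
```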


## NumPy

NumPy is a common library used for numerical computing. It is mainly used for its `ndarray` object, which represents multi-dimensional arrays.

Here's an example:

In [40]:
import numpy as np

# Creating a NumPy array
arr = np.array([1, 2, 3, 4, 5])

# Basic operations
arr_plus_10 = arr + 10 # Add 10 to each element
arr_squared = arr ** 2 # Square each element

# Displaying the results
print("Original array:", arr)
print("Array plus 10:", arr_plus_10)
print("Array squared:", arr_squared)

# Applying a mathematical function
mean_value = np.mean(arr)
print("Mean of the array:", mean_value)

Original array: [1 2 3 4 5]
Array plus 10: [11 12 13 14 15]
Array squared: [ 1 4 9 16 25]
Mean of the array: 3.0
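
Since `ndarray` objects can have more than one dimension, here is a brief sketch of the same ideas on a 2D array. The variable names `matrix` and `doubled` are illustrative; `shape`, `ndim`, and the `axis` argument of `np.mean` are standard NumPy features.

```python
import numpy as np

# A 2-dimensional array (2 rows, 3 columns).
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print("Shape:", matrix.shape)                # (2, 3)
print("Number of dimensions:", matrix.ndim)  # 2

# Arithmetic is applied element by element, just as in the 1D case.
doubled = matrix * 2
print("Doubled:\n", doubled)

# Functions such as np.mean can work over the whole array or along one axis.
print("Overall mean:", np.mean(matrix))          # 3.5
print("Column means:", np.mean(matrix, axis=0))  # [2.5 3.5 4.5]
```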


### Indexing Arrays

NumPy allows you to specify sub-arrays in a convenient manner. In the example below we create a 2-dimensional array (a matrix). We can access this with `arr_2d[i,j]` to get the element in the i'th row and j'th column. We can get sub-arrays by specifying ranges of values for i and j.

Note that `:` means "all indices".

Here are some examples:

In [41]:
# Creating a 2D NumPy array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

selected_columns = arr_2d[:, 1:3] # Get all rows, and columns 1 to 3

print("Original Array:\n", arr_2d)
print("Selected Columns:\n", selected_columns)


Original Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Selected Columns:
 [[2 3]
 [5 6]
 [8 9]]


Whoa! Notice that columns 1:3 resulted in only two columns! What's going on?

The notation `i:j` means to take columns `i` through `j-1`. This convention makes it easier to reference elements when you know the length of an array. Using `0:n` for a length `n` array will give all elements, `0` to `n-1`. In the example above `1:3` includes the middle and last columns (indices 1 and 2), but not the first (index 0).
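
The same `i:j` convention applies to ordinary Python lists, so a minimal sketch with a plain list (the names `letters`, `middle`, and `k` are illustrative) can show why it is convenient:

```python
letters = ["a", "b", "c", "d", "e"]
n = len(letters)

# 0:n covers every index from 0 to n-1, so it returns the whole list.
print(letters[0:n])                  # ['a', 'b', 'c', 'd', 'e']

# i:j contains exactly j - i elements.
i, j = 1, 3
middle = letters[i:j]
print(middle, len(middle) == j - i)  # ['b', 'c'] True

# Splitting at any index k gives two pieces that join back into the original.
k = 2
print(letters[:k] + letters[k:] == letters)  # True
```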

We can also index backwards from the end of an array:

In [42]:
# Create a 1D NumPy array
arr_1d = np.array([1, 2, 3, 4, 5])

# Print the last element
print(f"The last element is {arr_1d[-1]}.")

# Using [:-1] indexing to select all elements except the last one
all_but_last = arr_1d[:-1]

print("Original Array:", arr_1d)
print("All but last element:", all_but_last)


The last element is 5.
Original Array: [1 2 3 4 5]
All but last element: [1 2 3 4]


We can also use an array of Booleans to index into an array:

In [43]:
import numpy as np

# Creating a NumPy array
arr = np.array([1, 5, 10, 15, 20, 25, 30])

# Define the threshold
threshold = 15

# Get indices above threshold
indices = arr > threshold # This is an array of Booleans
print("indices = ", indices)

# Get the corresponding values
above_threshold = arr[arr > threshold]
print("above_threshold = ", above_threshold)

indices = [False False False False True True True]
above_threshold = [20 25 30]
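
Boolean arrays can also be combined, counted, and used for assignment. The following is a small sketch on the same data; the names `in_range` and `clipped` are illustrative, and the bounds 5, 25, and 20 are arbitrary choices.

```python
import numpy as np

arr = np.array([1, 5, 10, 15, 20, 25, 30])

# Combine two conditions element-wise with & (and); the parentheses are required.
in_range = (arr >= 5) & (arr < 25)
print("in_range =", in_range)
print("values in range =", arr[in_range])  # [ 5 10 15 20]

# True counts as 1, so summing a mask counts the matching elements.
print("count =", in_range.sum())           # 4

# A mask can also select which elements to overwrite (here on a copy).
clipped = arr.copy()
clipped[clipped > 20] = 20
print("clipped =", clipped)                # [ 1  5 10 15 20 20 20]
```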


## Types

We can get the type of an object using the `type` function.

In [44]:
# Print the type of a (the integer defined earlier).
print(type(a))

# Notice that here display is nicer:
display(type(a))

<class 'int'>


int

In [45]:
display(type(indices)) # This tells us it's a numpy ndarray, but not what is in the array.

numpy.ndarray


In [47]:
display(indices.dtype) # This tells us what is inside the numpy array.

dtype('bool')
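
To tie these together, here is a short sketch that applies `type` to a few built-in values, checks a type with `isinstance`, and converts the element type of an array with `astype`. The example values and the names `int_arr` and `float_arr` are illustrative.

```python
import numpy as np

# type() reports the class of any Python object.
for value in [10, 3.14, "hello", True, [1, 2, 3]]:
    print(repr(value), "->", type(value).__name__)

# isinstance() checks whether an object is of a given type.
print(isinstance(10, int))         # True
print(isinstance(10, float))       # False

# For NumPy arrays, dtype describes the elements; astype() converts them.
int_arr = np.array([1, 2, 3])
float_arr = int_arr.astype(float)
print(float_arr, float_arr.dtype)  # [1. 2. 3.] float64
```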